Memory optimized dists_add_symmetric #18

KushnirDmytro · 2023-04-11T21:08:16Z

I am proposing a PR that fixes the old issues reporting a 'CUDA out of memory' error when running the evaluation script.

I traced this issue to a single function:
cosypose.lib3d.distances.dists_add_symmetric

It allocates tensors of sizes NxNx3 and NxNx1, where N is the number of points.

Yet the same result can be achieved with a small rewrite of the code.

Alternative solutions are also possible and were tested:

- converting tensors to float16
- computing the distances pointwise (too slow; on CPU it is even slower)
- computing in batches of points (adds another parameter, still slower)

These approaches were tested but are not used in the current proposal. They could be added later if the issue arises again with larger point clouds or on constrained hardware.
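For reference, the "batches of points" alternative could look roughly like the sketch below. It is not the code from this PR: the function name `min_dists_chunked` and the `chunk_size` parameter are hypothetical, and it assumes unbatched (N, 3) and (M, 3) float tensors.

```python
import torch


def min_dists_chunked(gt_points, pred_points, chunk_size=1024):
    """For each point in gt_points, distance to its closest point in pred_points.

    gt_points:   (N, 3) tensor
    pred_points: (M, 3) tensor
    Only a (chunk_size, M, 3) difference block is alive at any time,
    instead of the full (N, M, 3) tensor.
    """
    mins = []
    for start in range(0, gt_points.shape[0], chunk_size):
        chunk = gt_points[start:start + chunk_size]             # (c, 3)
        diff = chunk[:, None, :] - pred_points[None, :, :]      # (c, M, 3)
        mins.append(diff.norm(dim=-1).min(dim=1).values)        # (c,)
    return torch.cat(mins)                                      # (N,)


# Small usage example on random (hypothetical) data.
torch.manual_seed(0)
gt = torch.rand(300, 3)
pred = torch.rand(200, 3)
print(min_dists_chunked(gt, pred, chunk_size=64).shape)
```

The extra `chunk_size` knob is the "another parameter" mentioned above: smaller chunks lower peak memory but add loop overhead.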

Also, some distance functions in the lib3d.symmetric_distances.py file could be optimized in the same way, as they compute similar distances.

This solution uses less than a quarter of the original version's memory.
The experiment was performed on the run_cosy_pose_eval.py pipeline,
evaluating 30 objects from the tless.bop version of the dataset.

The experiment on the RTX-2080 (8 GB) was not clean because the GPU was also used by the system GUI.
The scenario on the TITAN-X (12 GB) was much cleaner, as it ran on a headless server.
The old version of the code fails on both setups, while the new one works on both.

The low threshold on use cases could be explained by memory usage for context data and fragmentation. The error is triggered by the requirement to allocate one very large contiguous Tensor. This PR fixes the

KushnirDmytro · 2023-04-11T21:28:58Z

cosypose/lib3d/distances.py

-    dists = dists[ids_row, assign, ids_col]
-    return dists
+    distances = torch.cdist(TXO_gt_points, TXO_pred_points,
+                            p=2, compute_mode='donot_use_mm_for_euclid_dist')


compute_mode='donot_use_mm_for_euclid_dist' is the important parameter value.
Checked on the debug data instance: with the default mode, about 3% of points got a different closest-point id.
The difference in distance is only about 1e-6 to 1e-4, but the performance benefit of the default mode is rather small.
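A minimal sketch of how this check can be reproduced on random data. The tensors `gt` and `pred` are hypothetical stand-ins for TXO_gt_points and TXO_pred_points, assumed here to have shape (batch, N, 3), and the numbers it prints will differ from the ones measured on the real debug instance.

```python
import torch

torch.manual_seed(0)
gt = torch.rand(2, 500, 3)      # hypothetical stand-in for TXO_gt_points
pred = torch.rand(2, 500, 3)    # hypothetical stand-in for TXO_pred_points

# Broadcast version: materializes a (batch, N, N, 3) difference tensor.
diff = gt.unsqueeze(2) - pred.unsqueeze(1)
dists_broadcast = diff.norm(dim=-1)                         # (batch, N, N)

# cdist version: produces the (batch, N, N) distance matrix directly.
dists_exact = torch.cdist(
    gt, pred, p=2, compute_mode="donot_use_mm_for_euclid_dist")

# Default mode may switch to a matrix-multiplication formulation,
# which is slightly less precise.
dists_default = torch.cdist(gt, pred, p=2)

closest_exact = dists_exact.argmin(dim=-1)
closest_default = dists_default.argmin(dim=-1)
mismatch = (closest_exact != closest_default).float().mean().item()
max_gap = (dists_exact - dists_default).abs().max().item()
print(f"closest ids differing: {mismatch:.2%}, max distance gap: {max_gap:.1e}")
```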

nim65s · 2023-04-12T09:27:17Z

Thanks @KushnirDmytro for this work!

Maybe @ElliotMaitre you could test this to double-check?

ElliotMaitre · 2023-05-04T14:56:27Z

I tested it and it works for me. This change allows me to run the evaluation on the tless dataset on an RTX-3060 (12 GB). However, on the ycbv dataset, I still hit the CUDA out_of_memory issue. All in all, the improvement is still very noticeable!

Thank you for your contribution

KushnirDmytro · 2023-05-16T09:21:10Z

@ElliotMaitre
Thank you for both the review and the appreciation)

After your comment, I was puzzled by the reported YCBV dataset problem.
I downloaded it (with the cosypose download script, the exact proposed version of the dataset), then successfully ran the evaluation (as proposed on the landing page of this repo). On an RTX 2080 with 8 GB, it works fine and memory consumption is modest.

Then checked the data:

TLESS has 30 objects. Point cloud size varies from 4145 to 20801 pts each.
YCBV (bop) has 21 objects. Point clouds are standardized: only 2621 pts each (one object has 2620, to be precise).
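To put these sizes in perspective, here is a rough estimate of what the original NxNx3 and NxNx1 tensors would need for a single object. It assumes float32 storage and ignores any batch dimension, so actual usage in the pipeline is likely higher.

```python
BYTES_PER_FLOAT32 = 4  # assumption: tensors are stored as float32


def pairwise_bytes(n_points):
    """Bytes needed for the N x N x 3 and N x N x 1 tensors of one object."""
    diffs = n_points * n_points * 3 * BYTES_PER_FLOAT32   # N x N x 3
    dists = n_points * n_points * 1 * BYTES_PER_FLOAT32   # N x N x 1
    return diffs, dists


for name, n in [("TLESS smallest", 4145), ("TLESS largest", 20801), ("YCBV", 2621)]:
    diffs, dists = pairwise_bytes(n)
    print(f"{name:15s} N={n:6d}  "
          f"NxNx3: {diffs / 2**30:6.2f} GiB  NxNx1: {dists / 2**30:6.2f} GiB")
```

Under these assumptions the largest TLESS model needs roughly 4.8 GiB for the NxNx3 tensor alone, against less than 0.1 GiB for a YCBV model.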

It feels like you hit the CUDA out_of_memory issue for a different reason.
Several times I have observed that when eval (or another script) is terminated during ongoing computations, the process often hangs in the background and keeps occupying GPU memory. My hypothesis is that you launched eval on YCBV while a zombie TLESS-eval process was still running in the background. This is a reproducible scenario; I checked that.
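One way to check for such a leftover process before launching eval is to list the compute processes currently holding GPU memory. The helper below is only a sketch: the name `list_gpu_processes` is hypothetical, and it assumes `nvidia-smi` is on the PATH (it returns an empty list otherwise).

```python
import shutil
import subprocess


def list_gpu_processes():
    """Return (pid, process_name, used_memory) string tuples for GPU compute processes.

    used_memory is reported by nvidia-smi in MiB.
    Returns an empty list when nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    rows = []
    for line in result.stdout.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:
            rows.append(tuple(fields))
    return rows


for pid, name, used_mib in list_gpu_processes():
    print(f"pid={pid} name={name} used={used_mib} MiB")
```

If a stale eval process shows up in this list, terminating it should release its GPU memory before the next run.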

Memory optimized dists_add_symmetric

539672c

KushnirDmytro marked this pull request as ready for review April 11, 2023 21:29

KushnirDmytro mentioned this pull request Apr 11, 2023

RunTimeError: CUDA out of memory // Requirements on Graphic card? ylabbe/cosypose#25

Open

nim65s merged commit 1e45363 into Simple-Robotics:master May 4, 2023
